Contributing Source Code to the Chromium Browser Engine! - GSoC 2025 Proposal - Interaction to Next Paint (INP) Subparts

I was recently fortunate to be selected for GSoC 2025. This article is my proposal to the Chromium project for the Web Performance API, aimed at improving how the browser engine handles the INP performance metric.

Xiaohongshu post: "CQU sophomore selected as a Google GSoC contributor - g122622" - https://www.xiaohongshu.com/discovery/item/681e22300000000023011448?source=webshare&xhsshare=pc_web&xsec_token=ABqekbzfvoxIozXy36qJLYrAPjDUL5wByx4W95rEowRts=&xsec_source=pc_share


Overview & Background

Brief introduction about me

I’m a sophomore studying Computer Science at Chongqing University, China. I have been self-studying computer science since I was 9 years old and have nearly four years of experience researching the principles and design philosophies behind the Chromium project.

Recently, I happened to be studying Web performance standards. I was very excited when I came across this project on GSoC; it feels like it was meant to be!☺️

The present situation of INP


The INP metric was incorporated into the Core Web Vitals (CWV) system last year (2024). This year, the Chromium team plans to help developers analyze INP latency in depth by introducing a "subparts" reporting mechanism similar to the one used for the LCP metric. This approach breaks INP down into sub-dimensions such as input delay, processing time, and presentation delay.

The Chromium team has already implemented preliminary measurements of these timings within the Event Timing API.

The plan is to modify Chromium's Event Timing implementation so that fine-grained timestamps (including but not limited to timeStamp, processingStart, and processingEnd), which currently exist only in the renderer process, are transmitted to the browser process via Mojo IPC and ultimately integrated into the UKM metric system and the CrUX experimental dataset.

My understanding of how EventTiming works

This is my overall understanding of how EventTiming operates:

The EventTiming module is located in the third_party/blink/renderer/core/timing/ directory.

EventTiming::TryCreate() is the entry point for event timing capture and is primarily invoked in the following scenarios:

[Screenshot: call sites of EventTiming::TryCreate()]

EventDispatcher::Dispatch()
├─ EventTiming::TryCreate()  // Create the EventTiming object (success or failure is transparent to subsequent code)
│  ├─ DOMWindowPerformance::performance()
│  ├─ EventTiming::processing_start = Now();  // Get the processing start time
│  ├─ EventTiming::HandleInputDelay(..., processing_start)  // Collect and report some performance data (such as FID)
│  └─ EventTiming::EventTiming(processing_start, ...)
│     └─ performance_->EventTimingProcessingStart(..., processing_start, ...)
│        // ↑ Record the event's creation_time and processing_start
├─ target->DispatchEvent()  // Trigger event callbacks
│  └─ v8::Function::Call()  // Call into V8 to run the callback; the C++ stack blocks here until the JS stack returns
└─ EventTiming::~EventTiming()  // RAII takes effect, automatically invoking the destructor
   └─ WindowPerformance::EventTimingProcessingEnd()
      // ↑ Record processing_end and arm reporting via EnsureSendTimer
[IPC callback]
└─ WindowPerformance::ReportEventTimings();  // Report metrics such as INP

EventTiming adopts a non-intrusive design, so its lifetime is transparent to the main event-dispatch flow. Through C++'s RAII mechanism, it automatically records the processing end time on destruction:

// third_party/blink/renderer/core/timing/event_timing.cc
EventTiming::~EventTiming() {
  if (event_) {
    performance_->EventTimingProcessingEnd(*event_, Now());
  }
}

Incidentally, this design also leverages the LIFO nature of the call stack to handle nested events (such as an input event triggered within a pointer event), ensuring that start and end calls are paired correctly.

[Diagram: nested event timing objects on the stack]

The event dispatch call target->DispatchEvent(*pointer_event) ultimately reaches EventTarget::FireEventListeners, which iterates over the registered listener vector and invokes each callback in turn.

  • If the listener is a JavaScript function (JSBasedEventListener), V8 is used to execute the JS callback here.
  • If the listener is a native C++ object (such as built-in event handlers), it directly calls the C++ method.

For the former case (JS listeners), in JSBasedEventListener::Invoke, triggering the JS callback is fully synchronous:

v8::TryCatch try_catch(isolate);
try_catch.SetVerbose(true);
InvokeInternal(*event->currentTarget(), *event, js_event);  // Synchronously call the JS function

InvokeInternal eventually executes the JS function through V8’s v8::Function::Call(). The C++ main thread waits synchronously here for the JS function to finish execution, during which it is blocked and unable to handle other tasks.

The procedure for passing INP data from the renderer process to the browser process:

[Class diagram: INP data flow from the renderer process to the browser process]

This class diagram illustrates the pipeline: event handling starts in WindowPerformance ➡️ ResponsivenessMetrics performs the metric calculations ➡️ results are reported step by step through the framework client layer ➡️ PageTimingMetricsSender aggregates the data ➡️ finally, mojom structures carry it across processes.

(The arrows indicate the direction of calls/data flow.)


Tasks

1. Update the PageLoadMetricsSender mojo struct

Mojo is a cross-platform IPC framework that grew out of Chromium to support both intra-process and inter-process communication. I am very familiar with its principles.


The task: change reporting to cover each subpart of each event timing (with a non-zero interactionId), rather than just a single duration value.

The general idea: we can draw on the fields of the following C++ structure in performance_event_timing.h when customizing what the sender reports.😉

// third_party/blink/renderer/core/timing/performance_event_timing.h

struct EventTimingReportingInfo {
  uint64_t presentation_index;           // Presentation index
  base::TimeTicks creation_time;         // Time when the event was created
  base::TimeTicks enqueued_to_main_thread_time;  // Time when it was enqueued to the main thread
  base::TimeTicks processing_start_time; // Time when processing started
  base::TimeTicks processing_end_time;   // Time when processing ended
  base::TimeTicks commit_finish_time;    // Time when the rendering commit finished
  base::TimeTicks presentation_time;     // Time of presentation
  base::TimeTicks fallback_time;         // Fallback time
  base::TimeTicks render_start_time;     // Time when rendering started
  std::optional<int> key_code;           // Key code (for keyboard events)
  std::optional<PointerId> pointer_id;   // Pointer ID (for pointer events)
  bool prevent_counting_as_interaction;  // Whether to prevent counting as an interaction
  bool is_processing_fully_nested_in_another_event;  // Whether processing is fully nested within another event
};

Based on the C++ structure above, the following metrics can be derived.
(This table outlines each metric's details; the metric in bold is the most important 🚧)

| Metric name | Calculation formula | Description |
| --- | --- | --- |
| Enqueue delay (input delay) | `enqueued_to_main_thread_time - creation_time` | Measures the delay from when an event is created to when it is added to the main-thread queue. This can help identify whether events are waiting a long time before being processed. |
| Processing delay | `processing_start_time - enqueued_to_main_thread_time` | The time from when an event is added to the main-thread queue to when processing begins. It reflects how the main thread's busyness affects event handling. |
| Processing duration | `processing_end_time - processing_start_time` | The time spent processing the event. This helps in understanding the efficiency of the event-processing logic. |
| Render preparation time | `render_start_time - processing_end_time` | The duration from the end of event processing to the start of rendering. It may include style calculation, layout, and other preparatory work before rendering. |
| Render duration | `commit_finish_time - render_start_time` | The actual time spent rendering, from the start of rendering until the commit finishes. |
| Presentation delay | `presentation_time - commit_finish_time` | The delay from when the commit finishes to when the frame is finally presented to the user. This part can reveal bottlenecks in compositing and display. |
| **Total interaction-to-presentation time** | `presentation_time - creation_time` | The core metric of INP (Interaction to Next Paint): the total time from event creation (creation_time) to presentation to the user (presentation_time). It directly reflects the user's interaction experience. |

Below is the original code in components/page_load_metrics/common/page_load_metrics.mojom:

// components/page_load_metrics/common/page_load_metrics.mojom

// Metrics about general input delay.
struct InputTiming {

  // The number of user interactions, including click, tap and key press.
  uint64 num_interactions = 0;

  // List of user interactions since last update.
  // TODO(crbug.com/382949422): Remove the needless union wrapper.
  UserInteractionLatencies max_event_durations;
};

// Data for user interaction latencies which can be meausred in different ways.
union UserInteractionLatencies {
  array<UserInteractionLatency> user_interaction_latencies;
};

// The latency and the type of a user interaction.
struct UserInteractionLatency {
  mojo_base.mojom.TimeDelta interaction_latency;
  // The one-based offset of the interaction in time-based order; 1 for the
  // first interaction, 2 for the second, etc.
  uint64 interaction_offset;
  // The time the interaction occurred, relative to navigation start.
  mojo_base.mojom.TimeTicks interaction_time;
};

Modified code:

// (modified by me) components/page_load_metrics/common/page_load_metrics.mojom

// New subpart data structure.
struct InteractionSubpartTiming {
  // Input delay: time from event trigger to start of processing.
  mojo_base.mojom.TimeDelta input_delay;
  // Processing time: duration of listener execution.
  mojo_base.mojom.TimeDelta processing_time;
  // Presentation delay: rendering pipeline duration.
  mojo_base.mojom.TimeDelta presentation_delay;
  // Compatibility field: total delay should equal the sum of the three parts.
  // @deprecated Will be removed after gradual migration.
  mojo_base.mojom.TimeDelta total_duration;
};

// Modified user interaction latency structure.
struct UserInteractionLatency {
  // Original fields, retained during the transition period.
  mojo_base.mojom.TimeDelta interaction_latency;
  uint64 interaction_offset;
  mojo_base.mojom.TimeTicks interaction_time;

  // New subpart data.
  InteractionSubpartTiming subparts;

  // New interaction type identifier.
  // 0: click/tap, 1: keyboard, 2: drag, etc.
  uint8 interaction_type;
};

// Reworked union structure.
union UserInteractionLatencies {
  // Upgraded to an array carrying subpart data.
  array<UserInteractionLatency> detailed_interactions;

  // Old format, retained for compatibility during the transition period.
  array<mojo_base.mojom.TimeDelta> legacy_durations;
};

// Extended input timing structure.
struct InputTiming {
  // Split into basic statistics and detailed data.
  uint64 num_interactions;
  UserInteractionLatencies interaction_details;

  // New performance baseline identifier.
  // 0: uncalibrated, 1: lab environment, 2: real user data.
  uint8 measurement_quality;
};

I added the InteractionSubpartTiming struct specifically to carry subpart timing data, while retaining the existing UserInteractionLatency as a container struct; the new data is nested through its subparts field.

Update the Blink renderer-side "plumbing" to migrate these values from window_performance.cc / responsiveness_metrics.cc to PageTimingMetricsSender.

I plan to move the logic into the function below, using my new mojo struct to deliver the data.


// components/page_load_metrics/renderer/page_timing_metrics_sender.cc
// This is where timing data is ultimately sent from the renderer to the browser.
void PageTimingMetricsSender::DidObserveUserInteraction(
    base::TimeTicks max_event_start,
    base::TimeTicks max_event_queued_main_thread,
    base::TimeTicks max_event_commit_finish,
    base::TimeTicks max_event_end,
    uint64_t interaction_offset) {
  input_timing_delta_->num_interactions++;
  metadata_recorder_.AddInteractionDurationMetadata(max_event_start,
                                                    max_event_end);
  metadata_recorder_.AddInteractionDurationAfterQueueingMetadata(
      max_event_start, max_event_queued_main_thread, max_event_commit_finish,
      max_event_end);
  base::TimeDelta max_event_duration = max_event_end - max_event_start;
  input_timing_delta_->max_event_durations->get_user_interaction_latencies()
      .emplace_back(mojom::UserInteractionLatency::New(
          max_event_duration, interaction_offset, max_event_start));
  EnsureSendTimer();
}

2. From the browser side, update the UkmPageLoadMetricsObserver

For the changes to the UserInteractionLatency structure, I propose the following scheme, which uses the strategy pattern to handle both the old and new data formats.😊

Original code:

// components/page_load_metrics/browser/responsiveness_metrics_normalization.cc

std::optional<mojom::UserInteractionLatency>
ResponsivenessMetricsNormalization::ApproximateHighPercentile() const {
  std::optional<mojom::UserInteractionLatency> approximate_high_percentile;
  if (worst_ten_latencies_.size()) {
    uint64_t index =
        std::min(static_cast<uint64_t>(worst_ten_latencies_.size() - 1),
                 static_cast<uint64_t>(num_user_interactions_ /
                                       kHighPercentileUpdateFrequency));
    approximate_high_percentile = worst_ten_latencies_[index];
  }
  return approximate_high_percentile;
}

std::optional<mojom::UserInteractionLatency>
ResponsivenessMetricsNormalization::worst_latency() const {
  std::optional<mojom::UserInteractionLatency> worst_latency;
  if (worst_ten_latencies_.size()) {
    worst_latency = worst_ten_latencies_[0];
  }
  return worst_latency;
}

void ResponsivenessMetricsNormalization::AddNewUserInteractionLatencies(
    uint64_t num_new_interactions,
    const mojom::UserInteractionLatencies& max_event_durations) {
  num_user_interactions_ += num_new_interactions;
  // Normalize max event durations.
  NormalizeUserInteractionLatencies(max_event_durations);
}

void ResponsivenessMetricsNormalization::ClearAllUserInteractionLatencies() {
  num_user_interactions_ = 0;
  worst_ten_latencies_ = std::vector<mojom::UserInteractionLatency>();
}

Modified code:

// (modified by me) components/page_load_metrics/browser/responsiveness_metrics_normalization.cc
// Note: worst_ten_latencies_ is now a std::vector<mojom::UserInteractionLatencyPtr>.

using InteractionLatencyPtr = mojom::UserInteractionLatencyPtr;

// Add subpart delay calculation strategies.
namespace {

// Abstract strategy interface.
class LatencyCalculator {
 public:
  virtual base::TimeDelta GetTotalLatency(
      const InteractionLatencyPtr& latency) const = 0;
  virtual ~LatencyCalculator() = default;
};

// Strategy for the new data format: sum the three subparts.
class SubpartsLatencyCalculator : public LatencyCalculator {
 public:
  base::TimeDelta GetTotalLatency(
      const InteractionLatencyPtr& latency) const override {
    return latency->subparts->input_delay +
           latency->subparts->processing_time +
           latency->subparts->presentation_delay;
  }
};

// Strategy for the old data format: use the single duration field.
class LegacyLatencyCalculator : public LatencyCalculator {
 public:
  base::TimeDelta GetTotalLatency(
      const InteractionLatencyPtr& latency) const override {
    return latency->interaction_latency;
  }
};

}  // namespace

// Modified key functions.
std::optional<InteractionLatencyPtr>
ResponsivenessMetricsNormalization::ApproximateHighPercentile() const {
  if (worst_ten_latencies_.empty())
    return std::nullopt;

  const uint64_t index = std::min<uint64_t>(
      worst_ten_latencies_.size() - 1,
      num_user_interactions_ / kHighPercentileUpdateFrequency);

  return ApplyLatencySelectionStrategy(worst_ten_latencies_[index]);
}

std::optional<InteractionLatencyPtr>
ResponsivenessMetricsNormalization::worst_latency() const {
  if (worst_ten_latencies_.empty())
    return std::nullopt;
  return ApplyLatencySelectionStrategy(worst_ten_latencies_[0]);
}

void ResponsivenessMetricsNormalization::AddNewUserInteractionLatencies(
    uint64_t num_new_interactions,
    const mojom::UserInteractionLatencies& max_event_durations) {
  num_user_interactions_ += num_new_interactions;

  // Strategy pattern selector: pick the calculator matching the data format.
  std::unique_ptr<LatencyCalculator> strategy;
  if (max_event_durations.is_detailed_interactions()) {
    strategy = std::make_unique<SubpartsLatencyCalculator>();
  } else {
    strategy = std::make_unique<LegacyLatencyCalculator>();
  }

  if (max_event_durations.is_detailed_interactions()) {
    for (const auto& latency :
         max_event_durations.get_detailed_interactions()) {
      InsertSorted(worst_ten_latencies_,
                   strategy->GetTotalLatency(latency), latency);
    }
  } else {
    // Convert the old data format so downstream code sees a single shape.
    for (const auto& legacy_latency :
         max_event_durations.get_legacy_durations()) {
      auto converted = mojom::UserInteractionLatency::New();
      converted->interaction_latency = legacy_latency;
      converted->subparts = mojom::InteractionSubpartTiming::New();
      converted->subparts->total_duration = legacy_latency;
      InsertSorted(worst_ten_latencies_,
                   strategy->GetTotalLatency(converted), converted);
    }
  }

  // Keep at most ten elements.
  if (worst_ten_latencies_.size() > 10) {
    worst_ten_latencies_.resize(10);
  }
}

// New private method: when the new format is present, mirror the summed total
// into the legacy compatibility field before handing the value out.
InteractionLatencyPtr
ResponsivenessMetricsNormalization::ApplyLatencySelectionStrategy(
    const InteractionLatencyPtr& latency) const {
  InteractionLatencyPtr result = latency.Clone();
  if (latency->subparts) {
    result->interaction_latency = latency->subparts->input_delay +
                                  latency->subparts->processing_time +
                                  latency->subparts->presentation_delay;
  }
  return result;
}

// New private method: insert while keeping the container sorted by total
// latency in descending order (worst first).
void ResponsivenessMetricsNormalization::InsertSorted(
    std::vector<InteractionLatencyPtr>& container,
    base::TimeDelta new_latency,
    const InteractionLatencyPtr& raw_data) {
  InteractionLatencyPtr clone = raw_data.Clone();
  // Cache the computed total in interaction_latency so comparisons stay
  // consistent regardless of the input format.
  clone->interaction_latency = new_latency;
  auto it = std::lower_bound(
      container.begin(), container.end(), new_latency,
      [](const InteractionLatencyPtr& element, base::TimeDelta value) {
        return element->interaction_latency > value;
      });
  container.insert(it, std::move(clone));
}

My modification approach:

[Diagram: modification approach overview]

Complementary Test Cases:

(Some helpers, such as MakeInteraction, are defined by me; due to article-length limits, their definitions are omitted here 🥲)

// components/page_load_metrics/browser/responsiveness_metrics_normalization_unittest.cc

TEST_F(ResponsivenessMetricsNormalizationTest, HybridDataProcessing) {
  // A mojom union holds only one alternative at a time, so mixed old- and
  // new-format data arrives as separate batches.
  mojom::UserInteractionLatencies detailed_data;
  detailed_data.set_detailed_interactions({
      MakeInteraction(50ms, 30ms, 20ms),  // total 100ms
      MakeInteraction(80ms, 10ms, 10ms)   // total 100ms
  });
  normalization_->AddNewUserInteractionLatencies(2, detailed_data);

  mojom::UserInteractionLatencies legacy_data;
  legacy_data.set_legacy_durations({base::Milliseconds(90)});
  normalization_->AddNewUserInteractionLatencies(1, legacy_data);

  EXPECT_EQ(3u, normalization_->num_user_interactions());
  ASSERT_EQ(3u, normalization_->worst_ten_latencies().size());

  // Verify the old-format conversion: the 90ms legacy entry sorts after the
  // two 100ms interactions and keeps its duration in both compatibility fields.
  const auto& converted = normalization_->worst_ten_latencies()[2];
  EXPECT_EQ(90, converted->subparts->total_duration.InMilliseconds());
  EXPECT_EQ(90, converted->interaction_latency.InMilliseconds());
}

Affected Modules & Files

Blink renderer

blink/renderer/core/timing/window_performance.cc
blink/renderer/core/timing/responsiveness_metrics.cc

Metrics sender

components/page_load_metrics/renderer/page_timing_metrics_sender.cc
components/page_load_metrics/common/page_load_metrics.mojom

UKM

chrome/browser/page_load_metrics/observers/core/ukm_page_load_metrics_observer.cc
components/page_load_metrics/browser/responsiveness_metrics_normalization.cc
tools/metrics/ukm/ukm.xml


Development Process

| Phase | Weeks | Deliverables | Risk control |
| --- | --- | --- | --- |
| Environment setup & prototype verification | 2 | Chromium debugging environment; PoC for sub-dimension data collection | Accelerate compilation using GCP instances |
| Mojo protocol modification | 3 | Extended IPC interfaces; renderer-process modifications | Submit incremental CLs (change lists) and get timely code reviews |
| UKM integration | 2 | New UKM metric reporting; ukm.xml updates | Update ukm.xml validation rules in sync |
| Test suite development | 2 | Web Platform Tests (WPT) cases; unit-test coverage | Use the Chromium test framework |
| Performance regression testing | 1 | Benchmark reports; memory-usage analysis | Use the Telemetry performance testing framework |
| Documentation & wrap-up | 2 | Design documentation; CrUX integration plan | Reserve buffer time |

My Contributions to Other Open Source Projects

VSCode

Bytenode


About Me

I’m a sophomore studying Computer Science at Chongqing University, with a strong interest in browser internals. As a heavy GitHub user, I’m deeply familiar with collaborative workflows on the platform, including issue tracking, pull requests, and code reviews. I enjoy writing clean, maintainable code and have solid experience with design patterns and object-oriented programming.

Recently, I’ve been diving into the Chromium codebase, focusing on its architecture and core components. Through hands-on exploration, I’ve gained practical knowledge of the Mojo IPC framework and learned to utilize Web Performance APIs to analyze and optimize rendering pipelines. To solidify my understanding, I’ve built small experimental modules that interact with Chromium’s internals, though these are still works in progress.

My journey started when I became curious about how browsers render web pages efficiently. Books like How Browsers Work and Chromium’s official documentation became my go-to resources for self-learning. I’ve also documented some of my explorations through technical blog posts (written in Chinese), including an analysis of Chromium’s multi-process architecture and a tutorial on measuring page load metrics using PerformanceObserver.


Extra Info

Name: Yi Guo
Email: 20230503@stu.cqu.edu.cn
Github: github.com/g122622
Time Zone: UTC+08:00 (China)
Location: Chongqing, China